The aim of this project is to develop a machine learning model that predicts the loan eligibility of customers based on information provided in their application profile. I will be using open-source data pulled from Kaggle, which can be traced back to an Analytics Vidhya hackathon, and will implement multiple techniques to find the most accurate model for the problem.
Loans are an important part of the modern world, allowing students to cover educational expenses, businesses to grow, and banks to operate. Not to mention home loans, which help finance the purchase or construction of a new home without requiring the full price up front.
Banks have many factors to consider when determining whether an applicant qualifies for a loan, and they would arguably want to automate this process. So, we will be using machine learning techniques to predict a customer’s loan eligibility based on characteristics such as Education, Gender, Marital Status, Income, and more!
The dataset consists of 981 observations and 13 variables: 8 categorical and 5 numerical. 614 of the observations are designated for training, while the rest are for testing. The train and test datasets share the same variables except for Loan_Status. In this project, I intend to use supervised methods such as LDA, QDA, KNN, and Logistic Regression. As such, I will use only the training set, splitting it further into training and testing sets.
Now that we know the background and importance of our data, let’s begin! First, I will perform some initial data manipulation and cleaning, and address missing data. Then, I will perform exploratory data analysis, analyzing each variable one by one and considering any relationships with other variables in the dataset. Afterward, we will return to do some final tidying before setting up the models. We’ll perform a train/test split on the data, build a recipe, and set folds for the 10-fold cross-validation to be implemented. We’ll be fitting (and tuning where necessary) Logistic Regression, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Elastic Net Logistic Regression, K-Nearest Neighbors (KNN), and Pruned Decision Tree models before assessing their performance and the quality of their fit. From there, we will select the best model and fit it to our testing data.
We will first load the data, do some tidying and cleaning, look at missing values, and then perform some preliminary analysis on the data set before setting up our models.
First, let’s take a look at our data.
loan_ds <- read.csv("~/Desktop/Third year/PSTAT 131/project/project_data/train.csv")
str(loan_ds)
## 'data.frame': 614 obs. of 13 variables:
## $ Loan_ID : chr "LP001002" "LP001003" "LP001005" "LP001006" ...
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ Married : chr "No" "Yes" "Yes" "Yes" ...
## $ Dependents : chr "0" "1" "0" "0" ...
## $ Education : chr "Graduate" "Graduate" "Graduate" "Not Graduate" ...
## $ Self_Employed : chr "No" "No" "Yes" "No" ...
## $ ApplicantIncome : int 5849 4583 3000 2583 6000 5417 2333 3036 4006 12841 ...
## $ CoapplicantIncome: num 0 1508 0 2358 0 ...
## $ LoanAmount : int NA 128 66 120 141 267 95 158 168 349 ...
## $ Loan_Amount_Term : int 360 360 360 360 360 360 360 360 360 360 ...
## $ Credit_History : int 1 1 1 1 1 1 1 0 1 1 ...
## $ Property_Area : chr "Urban" "Rural" "Urban" "Urban" ...
## $ Loan_Status : chr "Y" "N" "Y" "Y" ...
A few caveats:
- Dependents, a numeric variable, is encoded as character due to the value “3+”.
- Credit_History, a categorical variable, is coded as numeric due to its values 0 and 1.
- ApplicantIncome and CoapplicantIncome are monthly figures; they would make more sense as annual figures in the same units as LoanAmount.

Now, let’s look at missing values:
colSums(is.na(loan_ds))
## Loan_ID Gender Married Dependents
## 0 0 0 0
## Education Self_Employed ApplicantIncome CoapplicantIncome
## 0 0 0 0
## LoanAmount Loan_Amount_Term Credit_History Property_Area
## 22 14 50 0
## Loan_Status
## 0
We can see that there are NAs in LoanAmount,
Loan_Amount_term and Credit_History. However,
it is important to note that is.na() does not detect blank
cells in character variables. This can become an issue since some
missing values are not identified.
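To make this concrete, here is a tiny self-contained demonstration (a toy vector, not the loan data) of how blanks slip past is.na():

```r
# is.na() flags only true NAs; blank strings pass unnoticed.
x <- c("Male", "", NA, "Female")
is.na(x)  # FALSE FALSE  TRUE FALSE -> the blank cell in position 2 is not flagged
x == ""   # FALSE  TRUE    NA FALSE -> comparing against "" surfaces it instead
```

This is why character columns need a separate check against the empty string.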
sapply(loan_ds, function(x) table(as.character(x) == "")["TRUE"])
## Loan_ID.NA Gender.TRUE Married.TRUE
## NA 13 3
## Dependents.TRUE Education.NA Self_Employed.TRUE
## 15 NA 32
## ApplicantIncome.NA CoapplicantIncome.NA LoanAmount.NA
## NA NA NA
## Loan_Amount_Term.NA Credit_History.NA Property_Area.NA
## NA NA NA
## Loan_Status.NA
## NA
We also have blanks in Gender, Married,
Dependents, and Self_Employed. We should first
transform these blanks into NAs for easy identification, and then
determine what to do about them.
First, we will replace blanks with NA’s.
loan_ds <- read.csv(file="~/Desktop/Third year/PSTAT 131/project/project_data/train.csv",
header=TRUE, na.strings = c("","NA"))
Next, we will scale ApplicantIncome and CoapplicantIncome.
loan_ds$ApplicantIncome <- (loan_ds$ApplicantIncome*12)/1000
loan_ds$CoapplicantIncome <- (loan_ds$CoapplicantIncome*12)/1000
# scale into annual incomes;
# and convert them into the same units as LoanAmount (in terms of thousands)
Because Loan_ID is unique and not relevant to our analysis, we will remove it.
loan_ds <- loan_ds[,-1]; colnames(loan_ds)
## [1] "Gender" "Married" "Dependents"
## [4] "Education" "Self_Employed" "ApplicantIncome"
## [7] "CoapplicantIncome" "LoanAmount" "Loan_Amount_Term"
## [10] "Credit_History" "Property_Area" "Loan_Status"
For ease of analysis, we will convert categorical variables into factors and convert Dependents into a numeric variable.
# convert factors
loan_ds$Gender <- factor(loan_ds$Gender, levels = c("Male","Female"))
loan_ds$Married <- factor(loan_ds$Married, levels = c("Yes","No"))
loan_ds$Education <- factor(loan_ds$Education, levels = c("Graduate","Not Graduate"))
loan_ds$Self_Employed <- factor(loan_ds$Self_Employed, levels = c("Yes","No"))
loan_ds$Property_Area <- factor(loan_ds$Property_Area, levels = c("Rural","Semiurban","Urban"))
loan_ds$Loan_Status <- factor(loan_ds$Loan_Status, levels = c("Y","N"), labels = c("Yes","No"))
# convert credit history into factor
loan_ds$Credit_History <- factor(loan_ds$Credit_History, levels = c(1,0), labels = c("Yes","No"))
# convert Dependents into numeric
loan_ds$Dependents <- recode(loan_ds$Dependents, "3+" = "3") %>%
as.integer()
Let’s start by visualizing the missing data.
vis_miss(loan_ds) # visualize missing data
It looks like the majority of our variables contain missing values. Let’s produce a summary of the missing values to view the percentage of missingness in each variable, as well as a cumulative sum.
loan_ds %>%
miss_var_summary(add_cumsum = TRUE)
There are 149 missing values in our dataset, with
Credit_History accounting for the largest share. Ideally, we
could just remove variables with a lot of missingness. But since we
don’t know which variables will be significant in prediction, removing
variables is inappropriate. Yet, removing all observations with
missingness may lead to biased results, so we want to keep them in the
analysis. As a general rule, we do not want to remove more than 10% of
the overall dataset. We will likely need to perform imputation at a
later step.
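As a quick sketch of that rule of thumb, we can measure how much data complete-case deletion would discard; the data frame below is a toy example, not loan_ds:

```r
# Share of rows lost if every observation containing an NA were dropped.
df <- data.frame(a = c(1, NA, 3, 4, 5),
                 b = c("x", "y", NA, "z", "w"))
prop_dropped <- 1 - sum(complete.cases(df)) / nrow(df)
prop_dropped  # 0.4 here; the same check on loan_ds would be compared against 0.10
```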
Let’s continue exploring our data and come back to this issue later.
The dataset consists of the following variables:
- Loan_ID: Unique loan ID.
- Gender: Male / Female.
- Married: Is the applicant married (Yes/No)?
- Dependents: The number of dependents (0, 1, 2, 3+) an applicant has.
- Education: An applicant’s education level (Graduate / Not Graduate).
- Self_Employed: Is the applicant self-employed (Yes/No)?
- ApplicantIncome: An applicant’s annual income (in thousands of dollars).
- CoapplicantIncome: A coapplicant’s annual income (in thousands of dollars).
- LoanAmount: The loan amount requested by an applicant (in thousands of dollars).
- Loan_Amount_Term: The term of the loan in months.
- Credit_History: Does the applicant’s credit history meet the bank’s requirements (Yes/No)?
- Property_Area: An applicant’s area of residence (Urban / Semiurban / Rural).
- Loan_Status: Whether the loan was approved (Yes/No). This is the target variable.

Before we start running our models, let’s visualize the distributions of our variables and explore potential relationships among predictors. We’ll first check out the response, generate a correlation matrix, then look at the predictors one by one, keeping an eye out for confounds.
First, we will look at the distribution of the response; let’s make a barplot and take a look at proportions.
loan_ds %>%
ggplot(aes(x = Loan_Status)) +
geom_bar() +
theme_grey()
loan_ds %>%
select(Loan_Status) %>%
table() %>%
prop.table()
## Loan_Status
## Yes No
## 0.6872964 0.3127036
Approximately 69% of applicants were approved, while only 31% of them
were rejected. This imbalance may make it difficult for the model to
learn to predict Loan_Status accurately, so we’ll need to
upsample or downsample the data at a later step.
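As a rough illustration of what upsampling does (a hand-rolled sketch on a toy response vector, not the recipe step we’ll actually use later):

```r
# Resample the minority class with replacement until the two classes are balanced.
set.seed(42)
y <- factor(c(rep("Yes", 69), rep("No", 31)))  # mimic the ~69/31 imbalance
minority <- which(y == "No")
extra <- sample(minority, sum(y == "Yes") - length(minority), replace = TRUE)
y_up <- y[c(seq_along(y), extra)]
table(y_up)  # both classes now have 69 observations
```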
First, we’ll create a heatmap of the numeric variables to get an idea
of their relationship. The principal diagonal represents each variable’s
correlation with itself (1, of course), and the color legend tells us
the strength and direction of correlation for each pair. The plot below
shows a moderate, positive correlation between LoanAmount
and ApplicantIncome, and very little correlation (+/- 0.20)
among the other predictors.
Below, we can see that LoanAmount and
ApplicantIncome have a correlation coefficient of 0.57.
Keep this in mind as we explore the dataset.
loan_ds %>%
select(where(is.numeric)) %>%
na.omit() %>%
cor() %>%
corrplot(method="number")
Now, we will analyze our predictors one-by-one to examine their distribution and relationship with the response.
We see that loan amount is right (positively) skewed, with most values between 0 and 400 thousand. As a result, extremely large outliers may pull the mean upwards. The mean loan amount requested by approved and rejected applicants is about the same; however, there is more variation among the rejected applicants.
require(gridExtra)
plot1 <- loan_ds %>%
na.omit(LoanAmount) %>%
ggplot(aes(x=LoanAmount)) +
geom_histogram(bins=40) +
theme_grey()
plot2 <- loan_ds %>%
na.omit(LoanAmount) %>%
ggplot(aes(Loan_Status, LoanAmount)) +
geom_boxplot(na.rm=T) +
geom_jitter(alpha = 0.1) +
theme_grey()
grid.arrange(plot1, plot2, ncol=2)
Recall that loan amount is positively correlated with applicant
income. The first plot shows a slight linear trend between the two
variables, especially in the lower amounts on the left. The second plot
shows a mix of approved and rejected applicants in each income range.
Though it looks like higher income applicants do tend to apply for
higher loans, it is difficult to determine whether having a higher
income improves their chances of getting approved. We should take a
closer look at Applicant Income.
We can see from the plot that applicant incomes are right skewed, with most values between 0 and 300 thousand. Even with the extremely high outliers omitted, there is a clear discrepancy between the proportions of the income ranges. Average incomes among approved and rejected applicants are about the same.
plot1 <- loan_ds %>%
filter(ApplicantIncome < 500) %>%
# omit high outliers for ease of visualization
ggplot(aes(x=ApplicantIncome)) +
geom_histogram(fill="bisque",color="white",alpha=0.7, bins=20) +
geom_density() +
geom_rug() +
labs(x = "applicant income") +
theme_minimal()
plot2 <- loan_ds %>%
filter(ApplicantIncome < 500) %>%
ggplot(aes(y=ApplicantIncome,x=Loan_Status, color=Loan_Status))+
geom_boxplot() +
theme_grey()
grid.arrange(plot1, plot2, ncol=2)
Upon inspecting the outliers (ApplicantIncome > 500),
we find that: 1) 2 out of 3 applicants have dependents, 2) 3 out of 3
are graduates, and 3) 3 out of 3 have no coapplicant. Assuming
coapplicants are spouses, those with dependents are likely single
fathers. The only rejected applicant 1) lives in a rural area, and 2)
has bad credit history. From these observations, we may infer that
dependents don’t seem to affect loan eligibility much, nor do loan
amount or term. Let’s keep this in mind.
loan_ds %>%
filter(ApplicantIncome > 500)
Let’s now take a look at CoapplicantIncome, grouped by loan status. Both plots show that coapplicant incomes are right skewed, with a mix of high and low incomes in each loan status category. Average coapplicant income for approved applicants is slightly higher than that of rejected applicants.
loan_ds %>%
ggplot(aes(x=Loan_Status, y=CoapplicantIncome, color=Loan_Status)) +
geom_boxplot()
An interesting finding is that many coapplicant incomes are 0 (i.e., no coapplicant) in both the Yes and No categories. I had been confident that a high coapplicant income would boost approval rates.
Let’s explore this further. The table below shows that 273 out of 614 records have a coapplicant income of 0, or about 44%. This is pretty large given the size of our dataset.
loan_ds %>%
dplyr::count(CoapplicantIncome == 0)
A natural question we may ask is how the presence of a
coapplicant alone (not the numerical value of their income) affects the
chances that a given applicant will be approved. The variable
has_coapp has the value of FALSE if coapplicant income is
0, and TRUE otherwise. From the contingency table, we see that about
72% of applicants with a coapplicant get approved, while only ~65% of
applicants without one do. That’s a 7-percentage-point
difference!
loan_ds %>%
dplyr:: mutate(has_coapp = if_else(CoapplicantIncome != 0,TRUE,FALSE)) %>%
group_by(has_coapp, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n))
# on average, ~72% of applicants w/ a coapplicant were approved for a loan
# while only ~65% of applicants w/o a coapplicant were approved.
Perhaps the presence of a coapplicant is more predictive of loan
status than the numerical value of their income. We should consider
transforming CoapplicantIncome into a factor.
Loan amount term is the term of the loan in months. From the plot below, we see that it has a left skew. This means that its mean (342 months, i.e., 28.5 years) is lower than its median and mode (both 360 months, i.e., 30 years). Because there are so few points on the lower end, the mode is more representative of the center.
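A toy left-skewed vector (illustrative values only, not the actual loan terms) shows how the mean gets pulled below the median and mode:

```r
# Mostly 360-month terms with a few short ones dragging the mean down.
term <- c(rep(360, 17), 300, 240, 180, 120)
mean(term)                    # ~331.4, below the center
median(term)                  # 360
names(which.max(table(term))) # mode: "360"
```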
loan_ds %>%
na.omit(Loan_Amount_Term) %>%
ggplot(aes(x=Loan_Amount_Term)) +
geom_bar() +
theme_grey()
# most frequent value (mfv): 360
We now look at Loan_Amount_Term in relation to
Loan_Status. Applicants requesting short-term loans seem
more likely to qualify, on average, but we do see a high approval rate
for 360. We should also keep in mind that ~85% of loans have a term of
360, however, so the data may underrepresent applicants requesting
alternative loan terms.
loan_ds %>%
na.omit(Loan_Amount_Term) %>%
ggplot(aes(x=Loan_Amount_Term, fill=Loan_Status)) +
geom_bar(position="fill")
Dependents is right skewed; most applicants have no dependents.
Approval rates are relatively similar across each number of dependents,
with the highest approval rate for 2. There is no clear pattern; this
indicates that an applicant’s Dependents may not be
influential in determining their Loan_Status.
plot1<-loan_ds %>%
na.omit(Dependents) %>% # should be able to impute this later
ggplot(aes(x=Dependents)) +
geom_bar()
# most applicants have no dependents
# dependents vs. loan status
plot2<-loan_ds %>%
na.omit(Dependents) %>%
ggplot(aes(x=Dependents, fill=Loan_Status)) +
geom_bar(position="fill")
# relatively similar likelihood of approval for each # of dependents
grid.arrange(plot1,plot2,ncol=2)
Do applicants with dependents request a larger loan than those
without? Indeed, we can see from the boxplots that individuals with
dependents do, on average, request a larger loan amount!
One thing to note is that there are more male applicants than female applicants (81% vs 19%); thus females may be underrepresented. A natural question to ask is whether there is bias. From the plots below, we can see that females are indeed less likely (by about 8%) to be approved for a loan than males.
prop.table(table(loan_ds$Gender))
##
## Male Female
## 0.8136439 0.1863561
# much more males than females.. unrepresentative?
# is there bias in the selection process?
loan_ds %>%
na.omit(Gender) %>% # should be able to impute this later
ggplot(aes(x=Gender, fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Gender) %>%
group_by(Gender, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n))
# slight bias; females 8% less likely to be approved than males
Surprisingly, given that most applicants have no dependents, married individuals comprise a majority of our dataset (~65%). Nonetheless, a 65-35 ratio offers a good contrast. Married individuals are 10% more likely to be approved for a loan. Given the size of our dataset, a 10% difference is pretty significant!
# distribution
prop.table(table(loan_ds$Married))
##
## Yes No
## 0.6513912 0.3486088
# are married individuals more likely to be approved?
loan_ds %>%
na.omit(Married) %>%
ggplot(aes(x=Married,fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Married) %>%
group_by(Married, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n))
# given the size of our dataset, 10% is pretty large!
Education denotes an applicant’s educational attainment: Graduate or Not Graduate. This may be an oversimplification of the possible levels of education that can be attained; however, a simple encoding is just what we need. Our dataset comprises ~80% graduates, which makes sense given educational loans. An 80-20 ratio, however, is definitely unbalanced. From the barplots, we can see that graduates are more likely to get approved (by about 8%).
prop.table(table(loan_ds$Education))
##
## Graduate Not Graduate
## 0.781759 0.218241
# our data is comprised of ~80% graduates!
# makes sense b/c educational loans, etc.
loan_ds %>%
na.omit(Education) %>%
ggplot(aes(x=Education,fill=Loan_Status)) +
geom_bar(position="fill")
# graduates are slightly more likely to get approved
# 2-way contingency table
loan_ds %>%
na.omit(Education) %>%
group_by(Education, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n))
# graduates are about 8% more likely
Self_Employed is a categorical variable that indicates whether an applicant is self-employed. Only 14% of applicants in the dataset are self-employed. There is no significant difference in approval rates between self-employed and non-self-employed individuals; a slight difference of 4%, with non-self-employed individuals having the higher rate.
prop.table(table(loan_ds$Self_Employed))
##
## Yes No
## 0.1408935 0.8591065
loan_ds %>%
na.omit(Self_Employed) %>%
ggplot(aes(x=Self_Employed,fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Self_Employed) %>%
group_by(Self_Employed, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n))
# slight difference; non-self-employed 4% more likely to be approved
Credit history is a categorical variable that indicates whether an applicant’s credit history satisfies the bank’s requirements (i.e., whether they have good credit history). Most applicants do have good credit history (~85%; unbalanced). There may be some selection bias, though, since individuals with good credit history may be more inclined to apply in the first place. Unsurprisingly, credit history turns out to be a very important predictor of loan status, given that nearly 80% of applicants with good credit history get approved, whereas only 10% of applicants with bad credit history do.
prop.table(table(loan_ds$Credit_History))
##
## Yes No
## 0.8421986 0.1578014
# most applicants have good credit history!
loan_ds %>%
na.omit(Credit_History) %>%
ggplot(aes(x=Credit_History,fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Credit_History) %>%
group_by(Credit_History, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n))
# *** VERY important predictor!!
# about 70% more likely to be approved if yes!
# only 10% of applicants w/ bad credit history get approved
Property area is a categorical variable that indicates the area in which an applicant resides (Urban, Semiurban, Rural). We have a pretty good mix of applicants from all 3 areas. Semiurban has the highest approval rate, then urban, then rural. With respect to other predictors, we can see that married individuals tend to prefer semiurban areas over urban or rural areas. This is a pretty insightful finding because, as we recall, married individuals had a 10% higher approval rate than non-married individuals. Coupled with a higher approval rate for those with coapplicants, we may just have found our target demographic!
prop.table(table(loan_ds$Property_Area))
##
## Rural Semiurban Urban
## 0.2915309 0.3794788 0.3289902
# good mix of applicants from all 3 areas
loan_ds %>%
na.omit(Property_Area) %>%
ggplot(aes(x=Property_Area,fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Property_Area) %>%
group_by(Property_Area, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n))
# is property area related to any other predictors?
loan_ds %>%
na.omit(Property_Area, Loan_Status, Married) %>%
dplyr:: mutate(has_coapp = if_else(CoapplicantIncome != 0,TRUE,FALSE)) %>%
ggplot(aes(x=Property_Area,fill=Married)) +
geom_bar(position="dodge") +
facet_wrap(~has_coapp)
Now that we’ve explored the dataset, we will need to fix some issues before continuing with our analysis. Let’s review them:
- Missing values exist in Credit_History, Self_Employed, LoanAmount, Dependents, Loan_Amount_Term, Gender, and Married. Based on the type of each variable, we will determine which imputation method to use.
- ApplicantIncome and LoanAmount have some extreme (high) outliers.

Missingness exists in both numerical and categorical data. Therefore, we will be using the mice package. The MICE (Multivariate Imputation by Chained Equations) algorithm imputes missing values with plausible values inferred from the other variables in the dataset.
# install and load
# install.packages("mice")
library(mice)
From the missing data table below, we see that the first two variables are missing a large proportion of their values, while the latter five are missing some.
loan_ds %>%
miss_var_summary()
Now, we call mice(). The argument
m indicates the number of multiple imputations; the
standard is m=5. The method argument specifies the
imputation method applied to all variables in the dataset; a separate
method can also be specified for each variable.
We can control the defaultMethod used for 1) numeric
data, 2) categorical data with 2 levels, 3) categorical data with >2
unordered levels, and 4) factor data with >2 ordered levels. I will
choose predictive mean matching for numeric data, logistic regression
for 2-level factors, linear discriminant analysis for unordered factor
data, and proportional odds for ordered factor data.
imp <- mice(loan_ds, m=5, defaultMethod = c("pmm","logreg","lda","polr"))
Here, we can see the actual imputations for
Dependents:
imp$imp$Dependents
Now let’s merge the imputed data into our original dataset via the
complete() function.
loan_ds <- complete(imp,5) # I chose the 5th round of data imputation
Checking the missing data again, we note that there is none after the imputation:
loan_ds %>%
miss_var_summary()
Outliers can be tricky. It’s hard to determine whether they are data entry errors, sampling errors, or natural variation in our data. Removing those records, however, may result in information loss. We will assume that the outliers are legitimate values until proven otherwise.
Looking at LoanAmount, we see that the “extreme” values
are somewhat plausible. Some customers may want to apply for a loan as
high as 650 thousand.
zscore <- (abs(loan_ds$LoanAmount-mean(loan_ds$LoanAmount, na.rm=T))/sd(loan_ds$LoanAmount, na.rm=T))
loan_ds$LoanAmount[which(zscore > 3)]
## [1] 650 600 700 495 436 480 480 490 570 405 500 480 480 600 496
Since we have a positive skew, we will perform a log transformation to normalize the data. Afterward, the data looks closer to normal and the effect of extreme outliers is significantly smaller.
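A quick sanity check of this idea on simulated data (a toy lognormal sample with a hand-rolled skewness helper, not the loan amounts):

```r
# A log transform pulls a heavy right tail back toward symmetry.
set.seed(1)
x <- rlnorm(1000, meanlog = 5, sdlog = 0.5)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3
skewness(x)       # clearly positive for the raw sample
skewness(log(x))  # near zero after the transform
```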
loan_ds$LogLoanAmount <- log(loan_ds$LoanAmount)
plot1 <- loan_ds %>%
ggplot(aes(x=LoanAmount)) +
geom_histogram(bins=20) +
geom_density()+
labs(title="Histogram for Loan Amount") +
xlab("Loan Amount")
plot2 <- loan_ds %>%
ggplot(aes(x=LogLoanAmount)) +
geom_histogram(bins=20) +
geom_density()+
labs(title="Histogram for Log Loan Amount") +
xlab("Log Loan Amount")
grid.arrange(plot1,plot2,ncol=2)
As for ApplicantIncome, we also have a pretty severe
positive skew, so we will perform a log transformation. The data looks
much better.
loan_ds$LogApplicantIncome <- log(loan_ds$ApplicantIncome)
plot1 <- loan_ds %>%
ggplot(aes(x=ApplicantIncome)) +
geom_histogram(bins=20) +
geom_density()+
labs(title="Histogram for Applicant Income") +
xlab("Applicant Income")
plot2 <- loan_ds %>%
ggplot(aes(x=LogApplicantIncome)) +
geom_histogram(bins=20) +
geom_density()+
labs(title="Histogram for Log Applicant Income") +
xlab("Log Applicant Income")
grid.arrange(plot1,plot2,ncol=2)
Now, we will remove the original variables from our dataset.
loan_ds <- select(loan_ds,-LoanAmount) # remove original variable
loan_ds <- select(loan_ds,-ApplicantIncome) # remove original variable
Now that we have a better idea of how the variables in our dataset impact loan status, it’s time to set up our models. We will perform our train/test split, create our recipe, then establish 10-fold cross-validation to help tune and evaluate our models.
Before we do any modeling, we will need to (randomly) split our dataset into training / testing data. The reason why we split our data is to avoid overfitting; we will fit the models on the training data, then use those models to make predictions on the previously unseen testing data. The testing set is reserved to be fit only once after the models have “learned” from the training set. From there, we can use error metrics to evaluate each model’s performance. We will use a 70/30 split since our dataset is relatively small and we want to reserve enough data for the testing set. We will set a random seed before our split so that we can replicate our results, and stratify on our response.
set.seed(3450)
loan_split <- initial_split(loan_ds, prop = 0.70,
strata = "Loan_Status")
loan_train <- training(loan_split)
loan_test <- testing(loan_split)
loan_folds <- vfold_cv(loan_train, v = 10, strata = "Loan_Status")
Dimensions of our datasets:
dim(loan_train); dim(loan_test)
## [1] 429 12
## [1] 185 12
Now that we’ve completed all the preliminary steps, it’s time to build our recipe. Think of it as following a recipe for cut-out cookies. Because we’ll be using a variety of different molds (models), each cookie will look different, but their ingredients will be the same! Inside, they’re all the same flour and sugar and eggs! That’s what this recipe is: a unique mix of ingredients that will be fitted to different molds. Our goal becomes finding the best mold for our particular mix. From there, fitting the best model to our test data is analogous to using a different brand of the essential ingredients (i.e., the test data), shaping the dough with our best cookie mold, then putting it into the oven!
In our recipe, we’ll be using 8 out of the 11 original predictors, 2
transformed variables LogLoanAmount and
LogApplicantIncome, plus a new variable
Coapplicant.
We’ll first need to upsample the data. Recall from earlier that our
response was severely imbalanced; if we train our models on an
imbalanced dataset, they can accidentally become better at identifying one
level versus another, which is undesirable. Two solutions come to mind:
upsampling or downsampling. Since we have a small dataset,
step_upsample() is the better option. We’ll use
over_ratio=1 so that there are as many Yes’s as there are
No’s. Because upsampling is intended to be performed on the training set
alone, the default skip option is skip=TRUE. We’ll first use
skip=FALSE to verify that it brings the counts to
equal, and then rewrite the recipe with the default.
Since the numerical values of CoapplicantIncome are not
strongly related to our response, we’ll transform it into a categorical
variable Coapplicant instead to indicate the presence /
absence of a coapplicant. We’ll then scale and center our numeric
predictors, and dummy-code the nominal predictors.
loan_recipe <- recipe(Loan_Status~., data=loan_train) %>%
step_upsample(Loan_Status, over_ratio = 1, skip = FALSE) %>%
step_mutate(Coapplicant = factor(if_else(CoapplicantIncome!=0, "Yes","No",NA))) %>%
step_rm(CoapplicantIncome) %>%
# transform coapplicant income into a factor
# Yes if CoapplicantIncome is not 0, No otherwise.
step_scale(all_numeric_predictors()) %>%
step_center(all_numeric_predictors()) %>% # scale and center
step_dummy(all_nominal_predictors()) # dummy-code nominal predictors
prep(loan_recipe) %>% bake(new_data = loan_train) %>%
group_by(Loan_Status) %>%
dplyr::summarise(count = n())
Now we rewrite the recipe with skip=TRUE:
loan_recipe <- recipe(Loan_Status~., data=loan_train) %>%
step_upsample(Loan_Status, over_ratio = 1, skip = TRUE) %>%
step_mutate(Coapplicant = factor(if_else(CoapplicantIncome!=0, "Yes","No",NA))) %>%
step_rm(CoapplicantIncome) %>%
# transform coapplicant income into a factor
# Yes if CoapplicantIncome is not 0, No otherwise.
step_scale(all_numeric_predictors()) %>%
step_center(all_numeric_predictors()) %>% # scale and center
step_dummy(all_nominal_predictors()) # dummy-code nominal predictors
We can use prep() to check the recipe and verify it
worked.
prep(loan_recipe) %>%
bake(new_data = loan_train) %>%
kable() %>%
kable_styling(full_width = F) %>%
scroll_box(width = "100%", height = "200px")
| Dependents | Loan_Amount_Term | LogLoanAmount | LogApplicantIncome | Loan_Status | Gender_Female | Married_No | Education_Not.Graduate | Self_Employed_No | Credit_History_No | Property_Area_Semiurban | Property_Area_Urban | Coapplicant_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2906089 | 0.2439113 | -0.0380982 | 0.1280076 | No | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 2.3077766 | 0.2439113 | 0.3630001 | -0.4959078 | No | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.2587427 | -1.2439388 | No | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -1.0311006 | -0.2761123 | No | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.4336240 | 0.9062231 | No | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.2256137 | -0.7307845 | No | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.2766806 | -0.1892989 | No | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -2.5081081 | -1.6238742 | No | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | 0.2439113 | 0.8215209 | -0.0165239 | No | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -1.0819002 | -0.4319064 | No | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 2.3077766 | 0.2439113 | 1.7073157 | 1.6481662 | No | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.5083344 | -1.2645184 | No | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.9333936 | -0.2377547 | No | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.1610357 | -0.7047869 | No | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.2924581 | 0.0065739 | No | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | 0.0491629 | 0.1388776 | No | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | 0.2439113 | 1.4933433 | 1.5218393 | No | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 0.2906089 | 0.2439113 | 0.9140085 | 0.2431861 | No | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.2256137 | -0.0165239 | No | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 1.2970804 | 0.4653528 | No | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.0773837 | 0.0314404 | No | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.5685145 | -0.1759075 | No | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2906089 | 0.2439113 | -0.3267810 | -2.1784279 | No | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.7736104 | 0.7217816 | No | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.1610357 | 0.8401258 | No | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1.2991928 | 0.2439113 | 0.9586210 | -0.1558404 | No | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.2892540 | -0.2495844 | No | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.5576605 | 0.4229599 | No | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 2.3077766 | 0.2439113 | 1.5198003 | -0.0781287 | No | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.8404548 | 0.1280076 | No | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 1.1751271 | 1.9242907 | No | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | -0.0232743 | -0.0042094 | No | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.1930510 | 0.6045475 | No | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 1.1908052 | 1.1781572 | No | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 1.4186330 | 0.9592790 | No | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 2.2045059 | -1.3884515 | -0.9586106 | No | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| 2.3077766 | -2.6969807 | -0.9097304 | -0.2709415 | No | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 2.2045059 | -0.4706130 | -0.8949495 | No | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -1.2711917 | -1.1932851 | No | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 0.2906089 | -0.7363861 | -1.2998369 | -0.9991058 | No | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.4339972 | 1.5863190 | No | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.3869610 | 0.5444794 | No | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.8632678 | 1.4544928 | No | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2.3077766 | -2.6969807 | -3.8836803 | -0.0811618 | No | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.0530385 | -0.3163526 | No | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 2.3077766 | 0.2439113 | -0.0380982 | -0.3110429 | No | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.0633256 | -5.0526513 | No | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | -1.1340916 | 0.2599440 | No | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -1.4813903 | -0.9694858 | No | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
| 2.3077766 | 0.2439113 | 0.1051925 | 0.1774403 | No | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 1.2748010 | 0.4706354 | No | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
| 0.2906089 | -2.6969807 | 0.1326012 | -0.7841576 | No | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.8215209 | -1.4032922 | No | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | 0.1051925 | -0.3123686 | No | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 0.7342597 | 2.0456601 | No | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.2421062 | 0.5474899 | No | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| 2.3077766 | 0.2439113 | -0.6261989 | 0.3338633 | No | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 1.6096087 | 1.8920306 | No | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 2.3077766 | 0.2439113 | -0.0380982 | 0.2388907 | No | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 2.4796737 | 2.4099457 | No | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | -0.7363861 | -0.5274789 | -0.7307845 | No | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.0530385 | 0.0662706 | No | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| -0.7179749 | 2.2045059 | 0.1729886 | 0.3813055 | No | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| 0.2906089 | 0.2439113 | -0.2755258 | -1.1177099 | No | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.6465720 | -1.2879043 | No | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 2.4796737 | 2.3777914 | No | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.3750182 | -0.8066571 | No | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2906089 | 0.2439113 | 2.0145104 | 1.0931727 | No | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.3267810 | 0.0430231 | No | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 2.3077766 | 0.2439113 | -0.5274789 | -1.2628617 | No | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 2.3077766 | 0.2439113 | 0.5357652 | -0.7036417 | No | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -1.9875201 | -0.8660048 | No | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 1.6345093 | 1.1077874 | No | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.4153955 | 0.4043432 | No | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 0.2906089 | 0.2439113 | -1.2711917 | -0.1262294 | No | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 2.2045059 | -0.5083344 | -0.5139802 | No | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.0348942 | 0.5897523 | No | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 2.2045059 | -0.5860949 | -0.7902057 | No | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 2.8070259 | 2.3396507 | No | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2906089 | -2.6969807 | -0.3095422 | -0.4829884 | No | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
| 1.2991928 | 0.2439113 | -0.5083344 | 0.1223773 | No | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -1.1607336 | -0.4257001 | No | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.4520287 | -1.5641298 | No | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| 1.2991928 | -2.6969807 | -2.0293870 | -0.5711002 | No | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| -0.7179749 | -0.7363861 | -1.3289195 | -0.6430543 | No | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.7539995 | 0.8622766 | No | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.2906089 | -0.7363861 | 0.2892540 | 1.0277957 | No | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.9761775 | 0.5602183 | No | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.5900386 | -1.0194140 | No | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.3869610 | -0.7307845 | No | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | 0.2439113 | -0.3267810 | 0.8228797 | No | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.4153955 | 0.6574378 | No | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | -1.6471356 | -0.2636463 | No | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.0832751 | -0.9106157 | No | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.2512820 | -0.0979534 | No | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.5357652 | -0.1929718 | No | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2906089 | -2.6969807 | -0.2755258 | -0.7002114 | No | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | 0.0205178 | -0.5795142 | No | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.2256137 | -0.5409852 | No | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2906089 | 0.2439113 | 2.5574342 | 2.2283894 | No | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -1.0563311 | -0.8301180 | No | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 0.8958667 | 0.6059950 | No | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.1051925 | -0.0593080 | No | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.5860949 | -0.2449260 | No | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | -2.6969807 | -1.1877535 | -1.0328686 | No | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 1.6529734 | 0.4887202 | No | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 2.3077766 | 0.2439113 | -0.1139995 | -0.4770625 | No | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.6635138 | 0.1607100 | No | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.6060414 | -0.3243523 | No | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.4570868 | 0.1597391 | No | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.3267810 | -0.2428133 | No | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| 0.2906089 | 0.2439113 | 0.7735368 | 1.4885402 | No | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -1.2998369 | -0.5564872 | No | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.5274789 | -1.0271890 | No | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | -0.1610357 | 0.6881251 | No | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.6218756 | -0.8183421 | No | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
| -0.7179749 | -2.6969807 | 0.0205178 | -0.2407035 | No | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | 1.2970804 | 1.1162480 | No | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -5.0951556 | -0.8660048 | No | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.4336240 | -0.4323849 | No | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.4455770 | 0.2946915 | No | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| 1.2991928 | 2.2045059 | 0.6839964 | 0.5674086 | No | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 2.3077766 | 0.2439113 | 0.2640236 | 0.6353679 | No | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.0773837 | -0.4706689 | No | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.7090324 | -1.2579024 | No | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2906089 | 0.2439113 | -1.4499042 | -1.2220224 | No | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0.2906089 | 0.2439113 | 0.2125375 | -0.6255580 | No | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| 0.2906089 | -4.2654564 | 0.5247224 | 0.0254388 | No | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.3973397 | -0.6590866 | No | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 1.2991928 | -1.7166834 | 0.8590565 | 0.5361694 | No | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | 0.6839964 | 0.6299185 | No | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 2.3077766 | -2.6969807 | 1.8780151 | -3.5072290 | No | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | -2.6969807 | -1.4813903 | -0.8520528 | No | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.0348942 | 0.1280076 | No | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.5083344 | 0.4975528 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -1.2998369 | -0.5139802 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.1610357 | -0.7407231 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.1461591 | 0.5361694 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.6060414 | -0.8949495 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 1.2991928 | 0.2439113 | 0.4798999 | -0.0758578 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1.2991928 | 0.2439113 | -1.1877535 | -0.4162014 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1.2991928 | 0.2439113 | 0.8120203 | -0.4775555 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.0832751 | 0.2447172 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | -1.7166834 | -0.5083344 | -0.2394390 | Yes | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | -2.6969807 | -0.2421062 | -0.7307845 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 0.2906089 | 0.2439113 | 1.6773171 | 0.5247638 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.7243126 | 1.2419205 | Yes | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.1295495 | -0.6190487 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.1610357 | -0.1759075 | Yes | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.1862630 | -0.2098172 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.6531892 | -0.0165239 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -1.9465536 | -1.2879043 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -1.0563311 | -0.8520528 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.5860949 | 0.1645873 | Yes | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| -0.7179749 | -2.6969807 | -0.7518402 | -0.3199028 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | 0.2439113 | -2.0721949 | 0.4448411 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 0.1862630 | 0.4902826 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 0.1862630 | -0.0781287 | Yes | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.5083344 | -1.1838259 | Yes | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.1610357 | -0.4711597 | Yes | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.5860949 | -1.0314467 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.6113223 | 0.9870961 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1.2991928 | 0.2439113 | 0.1862630 | -0.3436354 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.1610357 | -0.7902057 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | -0.7363861 | -0.9097304 | -0.2804349 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.3095422 | -0.1376105 | Yes | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0.2906089 | -1.7166834 | -1.8286890 | -0.0826807 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 0.0773837 | -0.8736695 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.5274789 | -0.3545057 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.4336240 | -0.7902057 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.5576605 | 0.4571282 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.0060320 | -0.5353407 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.1295495 | -0.0285728 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | -3.6772780 | -3.1490437 | -0.2293611 | Yes | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -1.8286890 | -1.1458023 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | -2.6969807 | -0.2421062 | -0.0085443 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.0060320 | -1.3228128 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | -2.6969807 | 0.0348942 | 0.0272064 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.2766806 | 0.2116086 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.3869610 | -0.1494746 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2906089 | 0.2439113 | -0.5083344 | -0.4879444 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1.2991928 | 0.2439113 | 1.0363816 | 1.5108650 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 2.3077766 | 2.2045059 | -0.6261989 | -1.0754269 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.0773837 | 0.3527908 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 0.1189462 | -0.5502674 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.3509057 | 0.8643113 | Yes | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.1930510 | 0.2750192 | Yes | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0.2906089 | -2.6969807 | 0.6635138 | 1.8816769 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.3141546 | -0.4319064 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | -0.8179117 | -0.9742684 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 1.3044493 | 1.3706797 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 0.6113223 | 0.4496609 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -2.0721949 | -0.0165239 | Yes | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.0913389 | -1.0278978 | Yes | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.9097304 | -0.5358530 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.7539995 | 0.0314404 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 2.3077766 | 0.2439113 | 1.9838684 | 2.6239813 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.2092628 | -0.1301443 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -1.0819002 | -1.1177099 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -1.1877535 | -0.6635394 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | -3.6772780 | -3.1490437 | -0.2982873 | Yes | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.4706130 | 0.2277892 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.4798999 | 0.3813055 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1.2991928 | -2.6969807 | 0.6839964 | 1.5553245 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.5083344 | -0.8968989 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -1.1877535 | 1.9107965 | Yes | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | 0.2439113 | -2.8017450 | -1.5262267 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.2892540 | -0.7442464 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2.3077766 | -2.6969807 | 2.9047329 | 3.4103549 | Yes | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| 0.2906089 | 0.2439113 | 0.3869610 | 0.5361694 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.1610357 | -0.5241143 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.5468179 | -1.2357846 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | 0.3630001 | -0.0161603 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | -2.6969807 | -1.0563311 | 1.2038879 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1.2991928 | 0.2439113 | 0.8120203 | 0.9720209 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.6060414 | -1.2711633 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.2256137 | -0.2817340 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | -2.6969807 | -1.1078172 | 0.1336170 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.3617337 | -0.2272699 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -1.9064496 | 0.2246909 | Yes | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.2755258 | -0.3172394 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | -0.7090324 | -0.7307845 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | 0.4570868 | 0.2364307 | Yes | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| 0.2906089 | 0.2439113 | 0.0633256 | -0.2804349 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 1.2991928 | 0.2439113 | -0.0985753 | -0.1098966 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.1610357 | 0.0690227 | Yes | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.9333936 | -0.3955114 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -1.6471356 | -0.0876275 | Yes | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -1.5134057 | -0.8029858 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | 0.2439113 | 1.1593190 | 0.9311361 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2.3077766 | 0.2439113 | -0.0085648 | 0.4592549 | Yes | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.0913389 | -0.4290388 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.0060320 | -0.1852284 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.1862630 | -0.5286969 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.5576605 | 0.8899172 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.0380982 | 0.5980166 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | 0.9049592 | 0.5980166 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.8120203 | 0.6339482 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.2906089 | 0.2439113 | -0.4336240 | -0.7956697 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 2.2045059 | -0.1610357 | -0.4214183 | Yes | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 1.4529559 | 1.0338075 | Yes | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0.2906089 | 0.2439113 | 0.1326012 | -0.4376579 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.5024430 | 0.4043432 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.1295495 | -0.2965363 | Yes | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 0.2906089 | 0.2439113 | -0.5860949 | -0.3627099 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | -4.6575753 | 0.3264839 | -0.5317596 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1.2991928 | 0.2439113 | -0.3617337 | 0.5980166 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.1139995 | -0.3216810 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.2924581 | 1.2644150 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | -2.6969807 | 0.0913389 | 0.6824040 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | 2.2045059 | -0.1139995 | -0.5747756 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.7090324 | -1.2803480 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2906089 | -2.6969807 | 1.4461406 | -0.6956497 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 2.3077766 | 0.2439113 | 1.2748010 | -0.4726333 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0.2906089 | 0.2439113 | 1.7659317 | 0.5569839 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.0491629 | -1.0834954 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.1610357 | 0.8123949 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.0380982 | 0.3094265 | Yes | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | 0.2640236 | 0.1687765 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 2.3077766 | 0.2439113 | 0.3869610 | 0.0430231 | Yes | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| 0.2906089 | 0.2439113 | -0.7090324 | -0.5779331 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -2.8017450 | -0.3987842 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.0680968 | -0.6007560 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.2640236 | -0.7191749 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | -0.2421062 | -0.6458064 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.8775505 | -0.0807824 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.9333936 | -0.4628366 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 2.2965322 | 1.8816769 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.2924581 | -0.1060337 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | -2.6969807 | -0.9816207 | -0.9158740 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.7303160 | -0.4362180 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.1610357 | -0.1946070 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.2421062 | -0.0318756 | Yes | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.6839964 | 1.2336572 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.1189462 | -0.4765697 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | 0.0491629 | -0.4386186 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2906089 | 0.2439113 | 0.0633256 | 0.1552002 | Yes | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | 0.2439113 | -0.4153955 | -0.5784599 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.0205178 | 0.2599440 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.1326012 | -0.0781287 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.0633256 | -0.1710671 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | 0.6635138 | 0.9076070 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.8404548 | -0.5564872 | Yes | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -1.6128127 | -0.7859695 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1.2991928 | -4.2654564 | 0.1862630 | 0.4706354 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1.2991928 | 0.2439113 | -0.1610357 | -0.1892989 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0.2906089 | 0.2439113 | -0.2924581 | -0.2627903 | Yes | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | -1.2998369 | -0.2373339 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.0205178 | -0.4323849 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.0085648 | 0.2345831 | Yes | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| 0.2906089 | 0.2439113 | -0.2092628 | 0.0607513 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.5468179 | -0.5471670 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -1.1877535 | 0.4051694 | Yes | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.0832751 | -0.7407231 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.3869610 | -0.7908118 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.0680968 | -0.4726333 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 1.3044493 | 0.6180040 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 1.0616120 | 1.1643756 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 2.3077766 | 0.2439113 | -0.6060414 | -0.6928057 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.0085648 | -0.8376028 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | -2.6969807 | -0.2256137 | -0.1506662 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1.2991928 | 0.2439113 | 0.4455770 | 1.0338075 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 2.3077766 | 0.2439113 | 0.8120203 | 0.3097198 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | 1.1273036 | 0.2599440 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.0085648 | 0.1822323 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.1461591 | 0.7306910 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.2906089 | 0.2439113 | 0.5576605 | 0.6160889 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -1.6471356 | 1.9991765 | Yes | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1.2991928 | -0.7363861 | 2.0346680 | 1.2038879 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.3267810 | -0.6381130 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.0085648 | 0.0349597 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | -0.0085648 | 0.3740163 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.0380982 | -0.3545057 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.3387339 | 0.5361694 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2906089 | -2.6969807 | -0.2755258 | -0.2098172 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 0.0773837 | -0.8949495 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 2.3077766 | 0.2439113 | 0.6635138 | 1.1783224 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2.3077766 | 0.2439113 | 1.4186330 | 1.2330197 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.1610357 | -0.7407231 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1.2991928 | -2.6969807 | -0.2755258 | -1.1335903 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.2755258 | -0.4643021 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1.2991928 | 0.2439113 | 0.0633256 | -0.3806397 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.6060414 | -0.4974057 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.4520287 | -0.7902057 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -1.7176944 | -1.1565692 | Yes | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.2421062 | -1.0067769 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.3869610 | 0.5994704 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.3267810 | -0.9478129 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -1.4813903 | -0.5165074 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.2924581 | -2.1410175 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.1051925 | -0.3945776 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.1051925 | 0.4923634 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.5083344 | -1.2711633 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.5860949 | 1.9122226 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.1452275 | -0.6928057 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | 0.0348942 | 0.5196669 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.7736104 | -0.5549298 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.2640236 | 1.0858121 | Yes | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 2.1560382 | 1.6930668 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.1729886 | -0.8363527 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.5083344 | -0.1759075 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | -1.7166834 | -0.5083344 | 1.3171976 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -1.8286890 | -1.1853984 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.3794535 | -0.9749528 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.1051925 | 0.8001410 | Yes | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.6113223 | 0.9531552 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 2.3077766 | 0.2439113 | 0.2384545 | 0.1684547 | Yes | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.2892540 | 3.3214361 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | -0.7363861 | 0.5576605 | 0.8401258 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.3267810 | 0.1506475 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | -2.6969807 | -0.7090324 | -0.8005432 | Yes | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.8404548 | 0.7880009 | Yes | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | 0.0773837 | 0.0503481 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.3869610 | -1.0666736 | Yes | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.3017450 | -0.5064238 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.0205178 | 0.3167407 | Yes | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.1326012 | 0.5361694 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0.2906089 | 0.2439113 | 0.4106243 | -0.3022345 | Yes | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 2.3077766 | 0.2439113 | -0.5083344 | 0.1632960 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.6465720 | -0.7902057 | Yes | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1.2991928 | 0.2439113 | 0.4106243 | 0.4131325 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | 0.2640236 | 2.0710848 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0.2906089 | 0.2439113 | -0.7956322 | 2.0840480 | Yes | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.3141546 | 0.0489556 | Yes | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2.3077766 | 0.2439113 | -0.2755258 | -0.4667477 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 1.1110919 | 1.4313153 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | 1.2063552 | 1.0338075 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1.2991928 | 0.2439113 | 0.3869610 | 0.1280076 | Yes | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.5860949 | -0.6741674 | Yes | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.6737826 | 0.3576281 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.3267810 | -0.1999337 | Yes | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | 1.0363816 | 2.1372788 | Yes | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| 0.2906089 | -4.2654564 | -0.4153955 | -0.3576910 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | -5.4418132 | -0.3095422 | 0.1418287 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 2.2045059 | -0.2755258 | -2.8427830 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | 0.8867306 | 0.2184753 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | -0.7363861 | -0.0985753 | -0.1502689 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | 1.1829823 | -0.2761123 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 2.4796737 | 2.4220661 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | 0.6941558 | -0.8029858 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 0.2906089 | 0.2439113 | 1.2370796 | 0.5524942 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 2.3077766 | 0.2439113 | 0.2640236 | -0.2952244 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1.2991928 | 2.2045059 | -0.2755258 | -1.1049752 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -1.6471356 | -0.7902057 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 2.3077766 | 0.2439113 | -0.5083344 | 0.4523320 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1.2991928 | 0.2439113 | 2.4796737 | 0.9621410 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.3988294 | 0.3439301 | Yes | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.0085648 | 0.7220049 | Yes | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| 2.3077766 | 0.2439113 | -0.5083344 | 0.0247312 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.0832751 | 2.2144418 | Yes | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 |
| 0.2906089 | -2.6969807 | 0.1326012 | 0.1362496 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | -0.6060414 | -0.2160272 | Yes | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0.2906089 | 0.2439113 | -0.0380982 | -0.9572567 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -0.4706130 | -0.5054191 | Yes | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.3264839 | -0.6359220 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.1994456 | -0.7745305 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.3267810 | -0.6266457 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2906089 | 0.2439113 | -0.4520287 | -0.7950616 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 0.2906089 | 0.2439113 | 0.3630001 | -0.3545057 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2906089 | 0.2439113 | -3.0743334 | 0.3955027 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.8404548 | -0.6922375 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 2.3077766 | 0.2439113 | 1.3117899 | 1.3310077 | Yes | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.4106243 | -0.1215449 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.6323708 | -0.0114411 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | -0.3617337 | -0.2065155 | Yes | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 0.2906089 | 0.2439113 | 2.9047329 | 2.3206419 | Yes | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | 0.9140085 | 0.5166005 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 0.0205178 | 0.4848070 | Yes | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.1610357 | 0.0891667 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| -0.7179749 | 0.2439113 | -1.1877535 | -0.3545057 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1.2991928 | 0.2439113 | 1.4186330 | 2.0839571 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | -0.3267810 | -0.4025331 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| -0.7179749 | 0.2439113 | -1.1877535 | -1.0081758 | Yes | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | 0.2439113 | -0.6261989 | 0.1822323 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| -0.7179749 | -2.6969807 | -1.6128127 | -0.5139802 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 2.3077766 | 0.2439113 | 1.5328922 | 1.2094031 | Yes | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| -0.7179749 | -2.6969807 | 0.1596211 | -0.1324981 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | 1.3117899 | 2.0334910 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1.2991928 | 0.2439113 | 0.7342597 | 0.4795737 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 0.3264839 | -0.5684804 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 2.3077766 | 0.2439113 | -0.0380982 | 0.4592549 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| -0.7179749 | 0.2439113 | 0.5247224 | -0.2061033 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2906089 | 0.2439113 | 2.5421340 | 1.5863190 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0.2906089 | 0.2439113 | 0.5357652 | -0.3243523 | Yes | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1.2991928 | 0.2439113 | 0.3509057 | -0.0830606 | Yes | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -0.3617337 | -0.4011262 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| -0.7179749 | 0.2439113 | -1.1607336 | -0.5653426 | Yes | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2.3077766 | -2.6969807 | -2.2537483 | -0.0385028 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1.2991928 | 0.2439113 | 0.6839964 | 0.8909165 | Yes | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
Notice that, by dummy-coding the nominal predictors, we’ve increased
the number of columns in our dataset. This is because each factor has
been transformed into k-1 dummy variables, with one level held out as
the reference (or baseline) level. The baseline level does not get a
column of its own; for a given predictor, if the dummy variables
corresponding to every other level are all 0, then we default
to the baseline. For instance, if both Property_Area_Urban and
Property_Area_Semiurban are 0, then the applicant must be
from a Rural area.
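As a base-R illustration of this k-1 coding (the project itself does this inside a tidymodels recipe; the toy factor below is hypothetical), model.matrix() expands a 3-level factor into 2 dummy columns, with the first level held out as the baseline:

```r
# A 3-level factor: "Rural" sorts first alphabetically, so it becomes the baseline
property_area <- factor(c("Rural", "Semiurban", "Urban", "Rural"))

# model.matrix() builds the design matrix; drop the intercept column to keep
# only the k-1 = 2 dummy variables
dummies <- model.matrix(~ property_area)[, -1]

colnames(dummies)  # "property_areaSemiurban" "property_areaUrban"
dummies[1, ]       # a Rural applicant: both dummies are 0
```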
We will stratify on our response variable Loan_Status
and use 10 folds to perform stratified cross-validation. K-fold
cross-validation divides our data into k folds of roughly equal size,
holds out the first fold as a validation set, and fits the model on the
remaining k-1 folds as if they were the training set. This is repeated k
times; each time, a different fold is used as the validation set. This
results in k estimates of the test MSE (or, in the classification case,
the test error rate).
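The fold mechanics can be sketched in base R (a minimal, unstratified sketch with made-up sizes; vfold_cv() from rsample handles the stratification and resample bookkeeping for us):

```r
set.seed(131)
n <- 20  # pretend we have 20 training observations
k <- 5   # and want 5 folds (the project uses 10)

# Randomly assign each observation a fold label 1..k
fold_id <- sample(rep(1:k, length.out = n))

# Hold out fold 1 as the validation set; fit on the remaining k-1 folds
val_idx   <- which(fold_id == 1)
train_idx <- which(fold_id != 1)

c(length(val_idx), length(train_idx))  # 4 validation rows, 16 analysis rows
```

Repeating this with each fold taking a turn as the validation set yields the k performance estimates described above.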
loan_folds <- vfold_cv(loan_train, v = 10, strata = Loan_Status)

To save computational time, we will save the results to an RDA file; once we have the model we want, we can load it later with no time commitment.
save(loan_ds, loan_folds, loan_recipe, loan_train, loan_test,
     file = "~/Desktop/Third year/PSTAT 131/project/rda_files/loan-setup.rda")

It’s time to build our models! For efficiency and ease of access, I will build each model in a separate R file and save my results in RDA files. The models will then be loaded below for further exploration. This allows us to streamline our analysis and save on computational time.
For each model, we will:

1. Set up the model specification and workflow.
2. Fit the model to the folded training data.
3. Use grid_regular to set up tuning grids of values for the parameters we’re tuning and specify levels for each.
4. Tune the model with tune_grid().
5. Select the best-performing model according to roc_auc and finalize the workflow.

For models requiring parameter tuning, we’ll complete steps 3-5.

Afterwards, we’ll load back in the saved files, collect error metrics, and analyze their individual performances.
The performance metric we’ll be using is roc_auc, which
stands for area under the ROC curve. The ROC (receiver operating
characteristic) curve is a popular graphic that plots the true positive
rate (TPR) vs. the false positive rate (FPR) at various threshold settings.
TPR is sensitivity (the proportion of positive observations that are
correctly classified), while FPR is 1-specificity (the proportion
of negative observations that are incorrectly classified as positive); the
higher the TPR at a given FPR, the better. The AUC (area under the curve)
is a measure of the diagnostic ability of a classifier, highlighting the
trade-off between sensitivity and specificity.
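To make these definitions concrete, here is a hand computation on a tiny hypothetical set of predicted probabilities (yardstick's roc_auc() does the equivalent for us on the real data):

```r
truth <- c(1, 1, 1, 0, 0, 1, 0, 0)                   # 1 = loan approved (toy data)
prob  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1)  # predicted P(approved)

# At a threshold of 0.5, predict "Yes" whenever prob >= 0.5
pred <- as.integer(prob >= 0.5)
tpr <- sum(pred == 1 & truth == 1) / sum(truth == 1)  # sensitivity: 0.75
fpr <- sum(pred == 1 & truth == 0) / sum(truth == 0)  # 1 - specificity: 0.25

# AUC = probability that a randomly chosen positive is ranked above a
# randomly chosen negative (ties count half)
pairs <- expand.grid(p = prob[truth == 1], n = prob[truth == 0])
auc <- mean(pairs$p > pairs$n) + 0.5 * mean(pairs$p == pairs$n)
auc  # 0.875
```

Sweeping the threshold from 1 down to 0 traces out the full ROC curve, and the AUC summarizes it in a single number.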
It’s time to load our models back in to evaluate their results!
load(file= "~/Desktop/Third year/PSTAT 131/project/rda_files/logistic.rda")
load(file= "~/Desktop/Third year/PSTAT 131/project/rda_files/knn.rda")
load(file= "~/Desktop/Third year/PSTAT 131/project/rda_files/en.rda")
load(file= "~/Desktop/Third year/PSTAT 131/project/rda_files/lda.rda")
load(file= "~/Desktop/Third year/PSTAT 131/project/rda_files/qda.rda")
load(file= "~/Desktop/Third year/PSTAT 131/project/rda_files/decision-tree.rda")

Here, we will visualize the results of our tuned models. We will use
the autoplot function to see how varying select parameters
affects each model’s performance on our metric of choice.
For the KNN model, we had 10 different levels of
neighbors. In general, the higher the number of
neighbors, the greater the roc_auc. The
roc_auc score of the best performing model (k=10) is
approximately 0.71, which is pretty decent.
autoplot(knn_tune_res)

In our elastic net model, we tuned 2 parameters with 10 levels
each: penalty, the amount of regularization, and
mixture, the proportion of lasso penalty (1 for pure lasso,
0 for pure ridge). We can see from the graph that the optimal mixture
was 0 (i.e., a pure ridge model), that lower levels of mixture resulted in
higher roc_auc scores, and that models performed worse as
penalty increased.
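The penalty being tuned can be written out directly. Below is a small sketch of the (assumed) glmnet-style parameterization, where mixture interpolates between the ridge and lasso terms; the coefficients are made up for illustration:

```r
# Elastic net penalty: penalty scales the whole term, mixture blends
# ridge (mixture = 0) and lasso (mixture = 1)
en_penalty <- function(beta, penalty, mixture) {
  penalty * ((1 - mixture) / 2 * sum(beta^2) + mixture * sum(abs(beta)))
}

beta <- c(0.5, -1, 2)  # hypothetical coefficients
en_penalty(beta, penalty = 0.1, mixture = 0)  # pure ridge: 0.2625
en_penalty(beta, penalty = 0.1, mixture = 1)  # pure lasso: 0.35
```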
autoplot(en_tune_res)

For our decision tree model, we focused on the parameter
cost_complexity and tuned it with 10 levels. Oftentimes
decision trees have too many splits, leading to a very complex model
that is likely to overfit the data. A smaller tree with fewer splits can
address this issue by yielding a simpler model (better interpretability,
at the cost of more bias).
The idea of cost-complexity pruning is similar to that of lasso /
ridge regularization: first, we grow a very large tree, then consider a
sequence of pruned subtrees and select the one that minimizes a
penalized error metric. The tuning parameter
cost_complexity controls a trade-off between a subtree’s
complexity and its fit to the training data; when
cost_complexity is 0, the criterion reduces to the training
error rate; as cost_complexity increases, the tree is
penalized for having too many terminal nodes.
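A toy version of the criterion, R_alpha(T) = R(T) + alpha * |T|, where R(T) is the subtree's training error and |T| its number of terminal nodes (the error rates and leaf counts below are made up for illustration):

```r
# Penalized error for a subtree with a given training error and leaf count
cost_complexity <- function(train_err, n_leaves, alpha) {
  train_err + alpha * n_leaves
}

# Three nested subtrees: the bigger the tree, the lower its training error
subtrees <- data.frame(train_err = c(0.10, 0.14, 0.20),
                       n_leaves  = c(12, 5, 2))

# With alpha = 0 the criterion is just training error, so the largest tree wins
which.min(cost_complexity(subtrees$train_err, subtrees$n_leaves, 0))      # 1
# A positive alpha charges for leaves, shifting the optimum to a smaller tree
which.min(cost_complexity(subtrees$train_err, subtrees$n_leaves, 0.015))  # 2
```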
We can see from the plot below that a cost-complexity of about 0.001
yields the optimal model. This indicates that our tree requires very
little pruning / penalization after all. Note that the parameter uses
the log10_trans() function by default, so all of the values in
our grid are on the log10 scale.
autoplot(dt_tune_res)

Here, we will compare the performance of each model on the training
data and create a visualization. I’ve created a tibble to display
the roc_auc score of each fitted model on the training
data.
log_auc <- augment(log_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, estimate = .pred_Yes) %>%
select(.estimate)
lda_auc <- augment(lda_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, estimate = .pred_Yes) %>%
select(.estimate)
qda_auc <- augment(qda_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, estimate = .pred_Yes) %>%
select(.estimate)
knn_auc <- augment(knn_final_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, estimate = .pred_Yes) %>%
select(.estimate)
en_auc <- augment(en_final_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, estimate = .pred_Yes) %>%
select(.estimate)
dt_auc <- augment(dt_final_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, estimate = .pred_Yes) %>%
select(.estimate)
roc_aucs <- c(log_auc$.estimate,
lda_auc$.estimate,
qda_auc$.estimate,
knn_auc$.estimate,
en_auc$.estimate,
dt_auc$.estimate)
mod_names <- c("Logistic Regression",
"LDA",
"QDA",
"KNN",
"Elastic Net",
"Decision Tree")

mod_results <- tibble(Model = mod_names,
                      ROC_AUC = roc_aucs)
mod_results <- mod_results %>%
  dplyr::arrange(desc(ROC_AUC))
mod_results

While all of our models did relatively well, the best-performing
model is the KNN model with an roc_auc score of 0.928, with
the decision tree close behind at approximately 0.88. I’ve created a
lollipop plot below to help visualize these results.
lp_plot <- ggplot(mod_results, aes(x = Model, y = ROC_AUC)) +
geom_segment( aes(x = Model, xend = 0, y = ROC_AUC, yend = 0)) +
geom_point( size=7, color= "black", fill = alpha("blue", 0.3), alpha=0.7, shape=21, stroke=3) +
labs(title = "Model Results") +
theme_minimal()
lp_plot

Now that we’ve identified our best models, we can further analyze their true performance. We will start with the KNN model, and also analyze the performance of the decision tree and QDA models as a means of comparison.
So, the KNN model performed the best overall, but which value of
neighbors yields the best performance?
# select metrics of best knn model
knn_tune_res %>%
collect_metrics() %>%
dplyr::arrange(mean) %>%
slice(10)

KNN model #10, with 11 predictors, 10 neighbors, and a mean
roc_auc score of 0.717, performed the best! Now that we have
our best model, we can fit it to our testing data to explore its true
predictive power.
The KNN model, with a final roc_auc score of 0.715, did
a pretty decent job. In general, an AUC value between 0.7 and 0.8 is
considered acceptable. Our results indicate that KNN may not be our best
choice; however, given the complex nature of our problem, I would
classify this as a win.
knn10_roc_auc <- augment(knn_final_fit, new_data = loan_test) %>%
roc_auc(Loan_Status, estimate = .pred_Yes) %>%
select(.estimate)
knn10_roc_auc

First, let’s view a confusion matrix:
knn_test_results <- augment(knn_final_fit, new_data = loan_test)
knn_test_results %>%
conf_mat(truth = Loan_Status, estimate = .pred_class) %>%
autoplot(type = "heatmap")

knn_test_results %>%
roc_curve(Loan_Status, .pred_Yes) %>%
autoplot()
Typically, the more the ROC curve hugs the top left corner of the
plot, the better the AUC. While our curve is not perfect, it still has
the generally correct shape and looks pretty decent. As a reminder,
“sensitivity” is another term for the true positive rate, and 1-specificity
is another name for the false positive rate.
Here’s a distribution of the predicted probabilities:
knn_test_results %>%
ggplot(aes(x = .pred_Yes, fill = Loan_Status)) +
geom_histogram(position = "dodge") + theme_bw() +
xlab("Probability of Yes") +
scale_fill_manual(values = c("blue", "orange"))

As previously mentioned, we will also explore the results of the
decision tree and QDA models on our testing data. We will start with the
decision tree. First, let’s compute its roc_auc score and
then create visualizations as needed.
The decision tree actually performed slightly better than the KNN model, although the difference is trivial (0.001). This indicates that a decision tree also may not be the best choice.
dt_roc_auc <- augment(dt_final_fit, new_data = loan_test, type = 'prob') %>%
roc_auc(Loan_Status, .pred_Yes) %>%
select(.estimate)
dt_roc_auc

The decision tree’s ROC curve illustrates that, as FPR increases, the model performs better; in fact, it performs great once FPR exceeds 0.5. Recall that FPR is 1-specificity; the specificity of a classifier represents its ability to correctly classify a negative observation as negative, while the sensitivity of a classifier is its ability to correctly classify a positive observation as positive. As FPR increases, specificity falls and, as indicated by the graph, sensitivity increases. For our classifier, sacrificing specificity to boost sensitivity is the right choice.
dt_roc_curve <- augment(dt_final_fit, new_data = loan_test, type = 'prob') %>%
roc_curve(Loan_Status, .pred_Yes) %>%
autoplot()
dt_roc_curve

Now, it’s time to analyze our quadratic discriminant analysis (QDA) classifier. In short, it’s an extension of the LDA model that can find non-linear (quadratic) decision boundaries between classes, assuming each class follows a Gaussian distribution with its own covariance matrix.
To my surprise, the QDA model is actually the most effective
classifier out of our best 3 models! The computed roc_auc
score is only slightly higher than that of the KNN and decision tree
models; nevertheless, a 0.02-point increase is meaningful when it
comes to AUC.
qda_roc_auc <- augment(qda_fit, new_data = loan_test, type = 'prob') %>%
roc_auc(Loan_Status, .pred_Yes) %>%
select(.estimate)
qda_roc_auc

To visualize this result, let’s plot a ROC curve:
augment(qda_fit, new_data = loan_test, type = 'prob') %>%
roc_curve(Loan_Status, .pred_Yes) %>%
autoplot()

Instead of fluctuating between concavity and convexity (in the case of KNN), or staying flat and then curving up after a specific FPR (in the case of the decision tree), the QDA model’s ROC curve is consistently concave and looks the best out of the 3 models considered.
To recap, in this project, we tackled the problem of predicting the loan status of an applicant given select demographics specified in their application. We had a relatively small dataset and a large number of features that were mostly unrelated. We split the dataset into a training set and a testing set, and fit a number of models of varying complexity and flexibility. Through analysis, testing, and assessment, we found the KNN model to be most optimal for predicting the loan status of a given applicant. However, the model was not perfect and leaves room for improvement.
In fact, none of our models performed particularly well for this problem. This can be due to a variety of factors, with the most likely culprits being a violation of assumptions or overfitting. None of the models that we considered are particularly robust in preventing overfitting (with the exception of the Ridge model). The Logistic model assumes that the data is linearly separable, performs poorly when relationships are non-linear, and also tends to overfit in higher dimensions. The LDA model assumes a linear decision boundary, and is especially prone to error in higher dimensions. With Elastic Net (which we selected to be a Ridge model), variance is reduced but bias is increased, perhaps a little too much in our case, as it was the worst fit to the training data. The QDA model, while an improvement over LDA/Logistic, assumes a quadratic decision boundary and is not as flexible as KNN or a decision tree; for more complex decision boundaries, a non-parametric approach may be preferred. The decision tree, a simpler model, has high variance and tends to overfit. KNN doesn’t require linear separability and makes no assumption about the distribution of the data; however, it does not model relationships between features very well and is also prone to overfitting.
Given that the KNN and QDA models outperformed most of our models in testing, and that models with linear decision boundaries performed poorly (i.e., LDA, Logistic Regression), we can say that the relationships in our data are likely non-linear. A potential improvement would be to consider more non-linear models or non-linear extensions to some of our models. Given that the decision tree also performed well in testing, non-parametric approaches may be the way to go.
As far as our error metric roc_auc goes, the QDA model
performed better than the KNN model on the test set, whereas the KNN
model was the best fit for the training set. Recall that KNN suffers from
the curse of dimensionality, and QDA is not a dimensionality reducer
either. This indicates that a more flexible approach may be suitable for
our data. A random forest may improve prediction accuracy by averaging
many de-correlated trees, reducing variance and the influence of
redundant features and noise.
It’s good to acknowledge, however, that none of our models performed particularly poorly either in the face of a complex problem. Though it may seem systematic, the loan eligibility of a customer is very random and prone to outside influence / noise. Some human attributes, such as an applicant’s background or motive for applying, are truly unquantifiable. With this understanding, assigning a class label to each applicant based on a few demographics seems unfair. Instead, applicants should be assessed on a case-by-case basis by a professional.
Nevertheless, I enjoyed taking on this challenge, and to be able to say that I predicted loan eligibility somewhat accurately is a huge accomplishment. This project was a great opportunity to build my skills with machine learning techniques and engage meaningfully with the course material. Moving forward, I would like to experiment with a few more models that I did not get the chance to build due to time constraints. I’d also like to explore some different visualization packages, and develop the ability to select the right ones to use for future problems.